
-p 80 \
-o filtered.fastq \
-Q33
fastqc filtered.fastq
firefox filtered_fastqc.html

In the above script, the “-i” option specifies the input FASTQ file, “-q” specifies the minimum
Phred quality threshold, “-p” specifies the minimum percentage of bases in a read that must
meet or exceed that threshold for the read to be kept, “-o” specifies the name of the output
FASTQ file where the filtered reads are stored, and “-Q33” tells the program that the FASTQ
quality encoding is Phred+33 (the default is “-Q64”; therefore, we must use “-Q33” for FASTQ
files with Illumina 1.9 encoding or later).
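To make the roles of “-q”, “-p”, and “-Q33” concrete, the following Python sketch applies the same kind of rule to a single quality string. It is only an illustration of the rule as described above, not the FASTX-Toolkit source code. The function names, the threshold of 20, and the example quality strings are assumptions chosen for the example.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores.

    offset=33 corresponds to Phred+33 (the "-Q33" case);
    offset=64 would correspond to the older Phred+64 encoding.
    """
    return [ord(ch) - offset for ch in quality_string]


def passes_filter(quality_string, min_quality, min_percent, offset=33):
    """Return True if at least min_percent of the bases have a Phred
    score of min_quality or higher (hypothetical helper, for illustration)."""
    scores = phred_scores(quality_string, offset)
    if not scores:
        return False  # an empty read cannot satisfy the criterion
    good = sum(1 for s in scores if s >= min_quality)
    # Compare using integers to avoid floating-point rounding.
    return 100 * good >= min_percent * len(scores)


# In Phred+33, 'I' decodes to 40 and '#' decodes to 2.
print(passes_filter("IIIIIIII##", min_quality=20, min_percent=80))  # 8 of 10 bases pass: True
print(passes_filter("IIIIIII###", min_quality=20, min_percent=80))  # 7 of 10 bases pass: False
```

Decoding the same string with the wrong offset shifts every score by 31, which is why the encoding must be stated explicitly for Phred+33 files.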

Figure 1.32 shows the per base sequence quality graph of the filtered FASTQ file. The
filtering process removed 499,970 reads that did not meet the criteria. The per base
sequence quality, which is the most important metric, has improved, and the per base
sequence content has also improved. However, some positions at the ends of the reads
still have low Phred quality scores. We can trim the low-quality bases from the ends of the
reads by using the “fastq_quality_trimmer” program. Instead of removing the reads that

FIGURE 1.32  A graph of the filtered “bad.fastq” file with low-quality bases at the read ends.